Another comparison of two area-level socioeconomic deprivation indices: SVI (Social Vulnerability Index) and ADI (Area Deprivation Index) and their associations with asthma and smoking outcomes from survey data

BMIN503/EPID600 Final Project

Author

Amy Goodwin Davies

Published

November 28, 2023


Caution

This report is still under construction, but ready for peer feedback.

library(haven)
library(tidyverse)
library(readxl)
library(tigris)
library(tidycensus)
library(sf)
library(leaflet)
library(gridExtra)
library(RColorBrewer)
library(kableExtra)
library(broom.mixed)
library(modelsummary)
options(tigris_use_cache = TRUE)
options(progress_enabled = FALSE)
# from https://rdrr.io/github/bradisbrad/olfatbones/src/R/get_tri.R
#' Triangularize a Correlation Matrix
#' @description Takes a correlation matrix and removes the redundant information
#' @param cormat Correlation Matrix
#' @param lower Lower or upper half (default = T) select F for upper half
#'
#' @return Matrix
#' @export get_tri
#'
#' @examples
#' cormat <- round(cor(mtcars), 2)
#' get_tri(cormat)
#'
get_tri <- function(cormat, lower = T){
  if(lower){
    cormat[upper.tri(cormat)] <- NA
  } else {
    cormat[lower.tri(cormat)] <- NA
  }
  cormat
}

1 Overview

The goal of this project is to compare the Agency for Toxic Substances and Disease Registry Social Vulnerability Index (SVI) and the University of Wisconsin-developed Area Deprivation Index (ADI) in their association with asthma and smoking outcomes from the Behavioral Risk Factor Surveillance System (BRFSS) CDC dataset.

For SVI, the most recent available dataset is from 2020. For this project, national ranking at the county level will be used. The most recent ADI dataset is from 2021. National ranking for ADI is provided at the census block group level. For the asthma and smoking outcomes in the BRFSS CDC dataset, the 2021 Selected Metropolitan/Micropolitan Area Risk Trends (SMART) subset will used. The SMART BRFSS dataset provides the CBSA (Core-based statistical area), which “consist of the county or counties (or equivalent entities) associated with at least one core (urban area) of at least 10,000 population, plus adjacent counties having a high degree of social and economic integration with the core as measured through commuting ties.” https://www.census.gov/programs-surveys/metro-micro/about/glossary.html

2 Introduction

As Rollings et al 2023 discuss, there exist several area-level socioeconomic deprivation indices without consensus within healthcare research and policy fields about which indices should be used for a variety of analyses. In comparing two frequently used indices, SVI and ADI, in their associations with SMART BRFSS dataset, this project has the potential to inform selection of the appropriate index for future work. This project is interdisciplinary in that it spans data science, social science, and biomedical informatics.

A challenge for this project, as with many projects which incorporate geospatial variables, is differing geographic units. For this project, SVI is at the county level, ADI is at the census block group level, and the BRFSS outcomes are at the CBSA level. Fortunately, all of these geographic units are census-defined and are supersets or subsets of each other. The current plan is to link these datasets at the county level. Based on guidance from Dr. Sherrie Xie, ADI will be aggregated to the county-level using a weighted average that incorporates census counts for census block groups. The BRFSS smoking and asthma outcomes will be aggregated at the CBSA level. CBSAs and CBSA level smoking and asthma outcomes will be mapped to counties. An alternative approach uses weighted averages to aggregate both SVI and ADI to the CBSA level, which can also be explored.

Once the linked dataset is prepared, regression models will be used to compare SVI and ADI as predictors of the aggregated smoking and asthma outcomes. Spatial autoregressive regression, as used by Tipirneni et al 2022 will also be considered. For smoking, the smoking status variable _SMOKER3 will be aggregated at the CBSA level. For asthma, the variable for lifetime asthma prevalence _LTASTH1 will be aggregated at the CBSA level. More information about the smoking and asthma variables available in the SMART BRFSS dataset is provided in Section 7.

There are several limitations of the project. One clear limitation is that the analyses will be restricted to areas of the US which are included in a CBSA grouping included in the SMART BRFSS dataset.

3 Methods

3.1 SMART BRFSS CDC dataset

# MMSA2021.xpt retrieved from https://www.cdc.gov/brfss/smart/smart_2021.html
# Accompanying documentation available at https://www.cdc.gov/brfss/annual_data/2021/pdf/2021_smart_brfss_mmsa_methodology-508.pdf
MMSA2021 <- read_xpt("data/input_data/mmsa/MMSA2021_XPT/MMSA2021.xpt") |> 
  mutate(`_resp_id` = str_pad(row_number(), pad = 0, width = 6)) # add respondent id
MMSA2021 |> glimpse()
# construct table of variable names and labels
mmsa_colname_labels <- MMSA2021 %>%
  map_dfc(attr, "label") |>
  pivot_longer(everything(),
               names_to = "colname",
               values_to = "label")
# include variable names without labels
mmsa_colnames_all <- MMSA2021 |>
  colnames() |>
  as_tibble() |>
  rename(colname = value) |>
  left_join(mmsa_colname_labels, by = join_by(colname))
# add missing labels documented in https://www.cdc.gov/brfss/annual_data/2021/pdf/2021_smart_brfss_mmsa_methodology-508.pdf
attr(MMSA2021[["_MMSA"]], "label") <- toupper("the code of metropolitan or micropolitan statistical area where the respondent lives")
attr(MMSA2021[["MMSANAME"]], "label") <- toupper("the MMSA name")
attr(MMSA2021[["_MMSAWT"]], "label") <- toupper("the MMSA-level weight that is used when generating MMSA-level estimates for variables in the data set")
attr(MMSA2021[["_resp_id"]], "label") <- toupper("respondent ID")
mmsa_colnames_all <- MMSA2021 %>%
  map_dfc(attr, "label") |>
  pivot_longer(everything(),
               names_to = "colname",
               values_to = "label")
# rename columns to make them easier to work with using R (not requiring ``)
# underscore prefix indicates derived or computed, replace with d_
smart_2021 <- MMSA2021 |> 
  rename_with(~ tolower(str_replace(.x, "^_", "d_")))
smart_2021 |> colnames()
  [1] "dispcode"  "statere1"  "celphon1"  "ladult1"   "colgsex"   "landsex"  
  [7] "respslct"  "safetime"  "cadult1"   "cellsex"   "hhadult"   "sexvar"   
 [13] "genhlth"   "physhlth"  "menthlth"  "poorhlth"  "priminsr"  "persdoc3" 
 [19] "medcost1"  "checkup1"  "exerany2"  "bphigh6"   "bpmeds"    "cholchk3" 
 [25] "toldhi3"   "cholmed3"  "cvdinfr4"  "cvdcrhd4"  "cvdstrk3"  "asthma3"  
 [31] "asthnow"   "chcscncr"  "chcocncr"  "chccopd3"  "addepev3"  "chckdny2" 
 [37] "diabete4"  "diabage3"  "havarth5"  "arthexer"  "arthedu"   "lmtjoin3" 
 [43] "arthdis2"  "joinpai2"  "marital"   "educa"     "renthom1"  "numhhol3" 
 [49] "numphon3"  "cpdemo1b"  "veteran3"  "employ1"   "children"  "income3"  
 [55] "pregnant"  "weight2"   "height3"   "deaf"      "blind"     "decide"   
 [61] "diffwalk"  "diffdres"  "diffalon"  "smoke100"  "smokday2"  "usenow3"  
 [67] "ecignow1"  "alcday5"   "avedrnk3"  "drnk3ge5"  "maxdrnks"  "flushot7" 
 [73] "flshtmy3"  "imfvpla2"  "pneuvac4"  "hivtst7"   "hivtstd3"  "fruit2"   
 [79] "fruitju2"  "fvgreen1"  "frenchf1"  "potatoe1"  "vegetab2"  "d_ststr"  
 [85] "d_impsex"  "cageg"     "d_rfhlth"  "d_phys14d" "d_ment14d" "d_hlthpln"
 [91] "d_hcvu652" "d_totinda" "d_rfhype6" "d_cholch3" "d_rfchol3" "d_michd"  
 [97] "d_ltasth1" "d_casthm1" "d_asthms1" "d_drdxar3" "d_lmtact3" "d_lmtwrk3"
[103] "d_prace1"  "d_mrace1"  "d_hispanc" "d_race"    "d_raceg21" "d_racegr3"
[109] "d_raceprv" "d_sex"     "d_ageg5yr" "d_age65yr" "d_age80"   "d_age_g"  
[115] "wtkg3"     "d_bmi5"    "d_bmi5cat" "d_rfbmi5"  "d_educag"  "d_incomg1"
[121] "d_smoker3" "d_rfsmok3" "d_cureci1" "drnkany5"  "d_rfbing5" "d_drnkwk1"
[127] "d_rfdrhv7" "d_flshot7" "d_pneumo3" "d_aidtst4" "ftjuda2_"  "frutda2_" 
[133] "grenda1_"  "frnchda_"  "potada1_"  "vegeda2_"  "d_misfrt1" "d_misveg1"
[139] "d_frtres1" "d_vegres1" "d_frutsu1" "d_vegesu1" "d_frtlt1a" "d_veglt1a"
[145] "d_frt16a"  "d_veg23a"  "d_fruite1" "d_vegete1" "d_mmsa"    "d_mmsawt" 
[151] "seqno"     "mmsaname"  "d_resp_id"
smart_2021 |> glimpse()
# https://www.cdc.gov/brfss/annual_data/2021/pdf/2021_smart_brfss_mmsa_methodology-508.pdf
# Typically, BRFSS data are used to produce state-level estimates; however, for the SMART
# project, BRFSS data were used to produce small area-level estimates for MMSAs as defined by
# the US Census Bureau. On June 6, 2003, OMB issued new definitions for MMSA and
# metropolitan divisions. OMB periodically updates the list of MMSAs. The list of areas used for this
# analysis can be found here: https://www.census.gov/geographies/reference-files/timeseries/demo/metro-micro/delineation-files.html. The March 2020 release was used for defining
# the MMSAs in the 2021 SMART data set.
# retrieved from https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/historical-delineation-files.html
delin_2020 <- read_xls("data/input_data/mmsa/list1_2020.xls",
                       range = "A3:L1919") |>
  rename_with(~ tolower(gsub(" ", "_", .x, fixed = TRUE))) |> 
  rename_with(~ tolower(gsub("/", "_", .x, fixed = TRUE)))


delin_2020_cbsa_join <- delin_2020 |> filter(is.na(metropolitan_division_code)) # TODO explore why this is necessary
delin_2020_mdc_join <- delin_2020 |> filter(!is.na(metropolitan_division_code)) # TODO explore why this is necessary

# TODO check this
mmsa_county_mapping_cbsa <- smart_2021 |> 
  mutate(d_mmsa_char = as.character(d_mmsa)) |>
  distinct(d_mmsa, d_mmsa_char, d_mmsa_char, mmsaname) |> 
  inner_join(delin_2020_cbsa_join, by = c("d_mmsa_char" = "cbsa_code"))  |> 
  distinct(d_mmsa, d_mmsa_char, csa_title, county_county_equivalent, fips_state_code, fips_county_code)
mmsa_county_mapping_mdc <- smart_2021 |> 
  mutate(d_mmsa_char = as.character(d_mmsa)) |>
  distinct(d_mmsa, d_mmsa_char, d_mmsa_char, mmsaname) |> 
  inner_join(delin_2020_mdc_join, by = c("d_mmsa_char" = "metropolitan_division_code"))  |> 
  distinct(d_mmsa, d_mmsa_char, csa_title, county_county_equivalent, fips_state_code, fips_county_code)
mmsa_county_mapping <- mmsa_county_mapping_cbsa |> 
  dplyr::union(mmsa_county_mapping_mdc) |> 
  distinct()
smart_2021 |> summarize(n_distinct(d_mmsa)) |> pull() == mmsa_county_mapping |> summarize(n_distinct(d_mmsa)) |> pull()

# check for issues with leading zeros
smart_2021 |>
  filter(nchar(d_mmsa) != 5)
smart_2021 |>
  mutate(d_mmsa_char = as.character(d_mmsa)) |>
  filter(nchar(d_mmsa_char) != 5)
mmsa_county_mapping |>
  filter(nchar(d_mmsa_char) != 5)
delin_2020 |>
  filter(nchar(cbsa_code) != 5)
tigris_counties <- counties() 

mmsa_county_mapping_geoid <- mmsa_county_mapping |>
  mutate(smart = TRUE) |> 
  inner_join(tigris_counties, by = c("fips_state_code"= "STATEFP", "fips_county_code" = "COUNTYFP")) |>
  mutate(smart = replace_na(smart, FALSE))

mmsa_county_mapping_geoid |> filter(smart == TRUE) |> nrow() == mmsa_county_mapping |> nrow()
[1] TRUE
mmsa_county_mapping_geoid_sf <- mmsa_county_mapping_geoid |>
  filter(!fips_state_code %in% c("02",
                         "15", "60",
                         "66", "69",
                         "72", "78")) |> # limit to contiguous states and DC
  st_as_sf()

# simple map
# mmsa_county_mapping_geoid_sf |>
#   st_simplify(dTolerance = 1e3) |>
#   ggplot(aes(fill = smart)) +
#   geom_sf()
# Before we create the leaflet maps, we will change the CRS (Coordinate Reference System) to 4326 because leaflet expects the data provided to it to be in WGS84, which is EPSG:4326. We can do the transformation with st_transform.
# This is slow so I did it once and saved the RDS

mmsa_county_mapping_geoid_sf_4326 <- st_transform(mmsa_county_mapping_geoid_sf, crs = 4326)
mmsa_county_mapping_geoid_sf_4326 |> saveRDS("data/intermediate_data/mmsa_county_mapping_geoid_sf_4326.rds")
mmsa_county_mapping_geoid_sf_4326 <- readRDS("data/intermediate_data/mmsa_county_mapping_geoid_sf_4326.rds")
pal_fun <- colorFactor(topo.colors(2), mmsa_county_mapping_geoid_sf_4326$smart)

mmsa_county_mapping_geoid_sf_4326  |>
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_fun(smart),
    fillOpacity = 0.5,
    smoothFactor = 0.5,
    # increase opacity and resolution
    popup = mmsa_county_mapping_geoid_sf_4326$county_county_equivalent
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_fun,
    # palette function
    values = ~ smart,
    # variable to pass to palette function
    title = 'Included in SMART BRFSS dataset',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()

3.1.1 Aggregating smoking and asthma variables at CBSA level

smart_2021 <- smart_2021 |>
  mutate(
    d_ever_smoker = if_else(d_smoker3 %in% c(1, 2, 3), 1, 0),
    d_ever_asthma = if_else(d_ltasth1 == 2, 1, 0)
  )
3.1.1.1 _SMOKER3
# https://www.r-bloggers.com/2020/02/dem-7283-example-1-survey-statistics-using-brfss-data/
library(survey)
library(srvyr)
# https://stackoverflow.com/questions/55975478/problems-due-to-having-too-many-single-psus-at-stage-one
options(survey.adjust.domain.lonely=TRUE)
options(survey.lonely.psu="adjust")
# TODO check this
smart_2021_surv <- smart_2021 %>%
  as_survey_design(ids = 1,
                   strata = d_ststr,
                   weights = d_mmsawt)
# Investigate survey weighting use wtkg3 as can apply general expectations about distributions

# WEIGHT IN KG
# [2 implied decimal places]
smart_2021 %>%
  filter(!is.na(d_mmsawt)) |>
  ggplot(aes(x = wtkg3 / 100)) + # divide by 100 for 2 implied decimal places
  geom_density() +
  theme_minimal()

uw_wtkg3_mmsa_tbl <- smart_2021 %>%
  filter(!is.na(d_mmsawt)) |>
  group_by(d_mmsa, mmsaname) |>
  summarize(uw_wtkg3 = mean(wtkg3, na.rm = TRUE))

w_wtkg3_mmsa_tbl <- svyby( ~ wtkg3,
                           ~ d_mmsa,
                           design = smart_2021_surv,
                           FUN = svymean,
                           na.rm = TRUE) |> 
  rename(w_wtkg3 = wtkg3)
# TODO investigate "only one PSU at stage 1" warnings / confirm approach

wtkg3_mmsa_tbl <- w_wtkg3_mmsa_tbl |>
  inner_join(uw_wtkg3_mmsa_tbl, by = "d_mmsa") |>
  select(-se)

wtkg3_mmsa_tbl |>
  pivot_longer(cols = c("w_wtkg3", "uw_wtkg3"),
               values_to = "wtkg3") |>
  ggplot(aes(x = wtkg3 / 100, colour = name)) + # divide by 100 for 2 implied decimal places
  geom_density() +
  theme_minimal()

wtkg3_mmsa_tbl |>
  arrange(desc(w_wtkg3)) |>
  head(5)

wtkg3_mmsa_tbl |>
  arrange(w_wtkg3) |>
  head(5)

pal_fun_weight <- colorNumeric("BuPu", NULL) 

mmsa_county_mapping_geoid_sf_4326  |>
  left_join(wtkg3_mmsa_tbl, by = "d_mmsa") |> 
  mutate(weight_kg = w_wtkg3/100) |> 
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_fun_weight(weight_kg),
    fillOpacity = 0.5,
    smoothFactor = 0.5,
    # increase opacity and resolution
    popup = mmsa_county_mapping_geoid_sf_4326$county_county_equivalent
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_fun_weight,
    # palette function
    values = ~ weight_kg,
    # variable to pass to palette function
    title = 'Mean weight (kg)',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()
# TODO make function
uw_ever_smoker_mmsa_tbl <- smart_2021 %>%
  group_by(d_mmsa, mmsaname) |>
  summarize(uw_ever_smoker = mean(d_ever_smoker))

w_ever_smoker_mmsa_tbl <- svyby(
  ~ d_ever_smoker,
  ~ d_mmsa,
  design = smart_2021_surv,
  FUN = svymean,
  na.rm = TRUE) |>
  rename(w_ever_smoker = d_ever_smoker)

ever_smoker_mmsa_tbl <- w_ever_smoker_mmsa_tbl |>
  inner_join(uw_ever_smoker_mmsa_tbl, by = "d_mmsa") |>
  select(-se)
ever_smoker_mmsa_tbl |>
  arrange(desc(w_ever_smoker)) |> 
  head(5)

ever_smoker_mmsa_tbl |>
  arrange(w_ever_smoker) |> 
  head(5)
pal_fun_smoker <- colorNumeric("inferno", NULL) 

mmsa_county_mapping_geoid_sf_4326  |>
  left_join(ever_smoker_mmsa_tbl, by = "d_mmsa") |> 
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_fun_smoker(w_ever_smoker),
    fillOpacity = 0.5,
    smoothFactor = 0.5,
    # increase opacity and resolution
    popup = mmsa_county_mapping_geoid_sf_4326$county_county_equivalent
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_fun_smoker,
    # palette function
    values = ~ w_ever_smoker,
    # variable to pass to palette function
    title = 'Proportion smoker',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()
3.1.1.2 _LTASTH1
# TODO make function
uw_ever_asthma_mmsa_tbl <- smart_2021 %>%
  group_by(d_mmsa, mmsaname) |>
  summarize(uw_ever_asthma = mean(d_ever_asthma))

w_ever_asthma_mmsa_tbl <- svyby(
  ~ d_ever_asthma,
  ~ d_mmsa,
  design = smart_2021_surv,
  FUN = svymean,
  na.rm = TRUE) |>
  rename(w_ever_asthma = d_ever_asthma)

ever_asthma_mmsa_tbl <- w_ever_asthma_mmsa_tbl |>
  inner_join(uw_ever_asthma_mmsa_tbl, by = "d_mmsa") |>
  select(-se)
ever_asthma_mmsa_tbl |>
  arrange(desc(w_ever_asthma)) |> 
  head(5)

ever_asthma_mmsa_tbl |>
  arrange(w_ever_asthma) |> 
  head(5)
pal_fun_asthma <- colorNumeric("magma", NULL) 

mmsa_county_mapping_geoid_sf_4326  |>
  left_join(ever_asthma_mmsa_tbl, by = "d_mmsa") |> 
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_fun_asthma(w_ever_asthma),
    fillOpacity = 0.5,
    smoothFactor = 0.5,
    # increase opacity and resolution
    popup = mmsa_county_mapping_geoid_sf_4326$county_county_equivalent
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_fun_asthma,
    # palette function
    values = ~ w_ever_asthma,
    # variable to pass to palette function
    title = 'Proportion asthma',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()

3.2 SVI dataset

# retrieved from https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
# documentation here https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2020Documentation_08.05.22.pdf
svi2020 <- read_csv("data/input_data/svi/SVI_2020_US_county.csv")
svi2020 |> glimpse()
svi2020_sf <- svi2020 |>
  select(RPL_THEME1,
         RPL_THEME2,
         RPL_THEME3,
         RPL_THEME4,
         RPL_THEMES,
         FIPS) |>
  left_join(tigris_counties, by = c("FIPS" = "GEOID"))
  st_as_sf()

svi2020_sf_4326 <- st_transform(svi2020_sf, crs = 4326)
svi2020_sf_4326 |> saveRDS("data/intermediate_data/svi2020_sf_4326.rds")
svi2020_sf_4326 <-
  readRDS("data/intermediate_data/svi2020_sf_4326.rds")   |>
  filter(!STATEFP %in% c("02",
                         "15", "60",
                         "66", "69",
                         "72", "78"))  # limit to contiguous states and DC
pal_svi <- colorNumeric("inferno", NULL, reverse = TRUE)

svi2020_sf_4326  |>
  filter(!is.na(FUNCSTAT)) |> # TODO explore why this is necessary
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_svi(RPL_THEME1),
    fillOpacity = 0.5,
    smoothFactor = 0.5
    # increase opacity and resolution
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_svi,
    # palette function
    values = ~ RPL_THEMES,
    # variable to pass to palette function
    title = 'SVI',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()

3.3 ADI dataset

# retrieved from https://www.neighborhoodatlas.medicine.wisc.edu/
adi2021 <- read_csv("data/input_data/adi/adi-download/US_2021_ADI_Census_Block_Group_v4_0_1.csv")
tigris_cbg <- block_groups(cb = TRUE)  |>
  filter(!STATEFP %in% c("02",
                         "15", "60",
                         "66", "69",
                         "72", "78"))
adi2021 |> glimpse()
adi2021_sf <- adi2021 |>
  inner_join(tigris_cbg, by = c("FIPS" = "GEOID")) |>
  st_as_sf()

adi2021_cbg_sf_4326 <- st_transform(adi2021_sf, crs = 4326)
adi2021_cbg_sf_4326 |> saveRDS("data/intermediate_data/adi2021_cbg_sf_4326.rds")
adi_no_cbg <- adi2021 |>
  anti_join(tigris_cbg, by = c("FIPS" = "GEOID"))


cbg_no_adi <- tigris_cbg |>
  anti_join(adi2021, by = c("GEOID" = "FIPS"))

# When a Census block group falls into one or more of the suppression criteria mentioned above the ADI rank is replaced with a code describing the suppression reason. Three possible codes will appear in the ADI field: "PH" for suppression due to low population and/or housing, "GQ" for suppression due to a high group quarters population, and "PH-GQ" for suppression due to both types of suppression criteria. A code of "QDI" designates block groups without an ADI due to Questionable Data Integrity, stemming from missing data in the source ACS data.
contig_state_vctr <- datasets::state.abb |>
  setdiff(c("AK", "HI"))
# https://walker-data.com/umich-workshop-2022/intro-2020-census/#1
# get cbg counts to compute cbg weights for aggregating adi at county level
cbg_counts <- list()
for (contig_state in contig_state_vctr) {
  cbg_counts[[contig_state]] <- get_decennial(
    geography = "block group",
    variables = "P1_001N",
    state = contig_state,
    year = 2020,
    output = "wide"
  ) |>
    mutate(state_abb = contig_state)
}
cbg_counts |> saveRDS("data/intermediate_data/cbg_counts.RDS")
cbg_counts <- readRDS("data/intermediate_data/cbg_counts.RDS") |> 
  bind_rows()

# computed cbg weights for county level
cbg_wts <- tigris_cbg |> 
  select(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE, GEOID) |>
  st_drop_geometry() |> 
  inner_join(cbg_counts, by = "GEOID") |> 
  group_by(STATEFP, COUNTYFP) |>
  # county pop
  mutate(county_pop = sum(P1_001N)) |> 
  ungroup() |> 
  # cbg pop / county pop
  mutate(cbg_wt = P1_001N/county_pop) |>
  arrange(GEOID)

cbg_wts |> saveRDS("data/intermediate_data/cbg_wts.RDS")
cbg_wts <- readRDS("data/intermediate_data/cbg_wts.RDS")

# compute weighted average for adi at county level
county_adi <- adi2021_cbg_sf_4326 |>
  st_drop_geometry() |> 
  select(-STATEFP, -COUNTYFP, -TRACTCE, -BLKGRPCE, -NAME) |> 
  inner_join(cbg_wts, by = c("FIPS" = "GEOID")) |>
  mutate(ADI_NATRANK = as.numeric(ADI_NATRANK)) |>
  filter(!is.na(ADI_NATRANK)) |>
  # national adi rank * cbg wt
  mutate(adi_wts = ADI_NATRANK * cbg_wt) |>
  group_by(STATEFP, COUNTYFP) |>
  # weighted average sum of weighted adis
  summarize(county_adi_value = sum(adi_wts))

county_adi |> saveRDS("data/intermediate_data/county_adi.RDS")

# cbg_counts_comb <- cbg_counts |> 
#   bind_rows()
# cbg_counts_comb |> ggplot(aes(x = P1_001N)) + geom_histogram()
county_adi <- readRDS("data/intermediate_data/county_adi.RDS") |>
  mutate(FIPS = paste0(STATEFP, COUNTYFP)) |> 
  left_join(tigris_counties, by = c("FIPS" = "GEOID")) |> 
  st_as_sf()

adi2021_county_sf_4326 <- st_transform(county_adi, crs = 4326)
adi2021_county_sf_4326 |> saveRDS("data/intermediate_data/adi2021_county_sf_4326.rds")
adi2021_county_sf_4326 <- readRDS("data/intermediate_data/adi2021_county_sf_4326.rds")


pal_adi <- colorNumeric("magma", NULL)

adi2021_county_sf_4326  |>
  st_simplify(dTolerance = 1e3)  |>
  leaflet() |>
  addPolygons(
    stroke = FALSE,
    # remove polygon borders
    fillColor = ~ pal_adi(county_adi_value),
    fillOpacity = 0.5,
    smoothFactor = 0.5
    # increase opacity and resolution
  ) |>
  addProviderTiles(providers$CartoDB.Voyager) |> # add third party provider tile
  addLegend(
    "bottomright",
    # location of legend
    pal = pal_adi,
    # palette function
    values = ~ county_adi_value,
    # variable to pass to palette function
    title = 'adi',
    # legend title
    opacity = 1 # legend opacity (1 = completely opaque)
  ) |>
  addScaleBar()

3.4 Combining datasets

combined_dataset <- mmsa_county_mapping_geoid_sf_4326 |>
  select(d_mmsa, county_county_equivalent, fips_state_code, fips_county_code) |>
  left_join(select(ever_asthma_mmsa_tbl, -mmsaname), by = "d_mmsa") |> 
  left_join(select(ever_smoker_mmsa_tbl, -mmsaname), by = "d_mmsa") |>
  st_join(select(svi2020_sf_4326, starts_with("RPL")),
          left = TRUE,
          join = st_equals) |> 
  st_join(select(adi2021_county_sf_4326, county_adi_value),
          left = TRUE,
          join = st_equals)

combined_dataset |> saveRDS("data/final_data/combined_dataset.rds")
# add county counts for poisson regression
combined_dataset <- readRDS("data/final_data/combined_dataset.rds")
combined_dataset_no_geo <- combined_dataset |>
  st_drop_geometry()


county_cts <- get_decennial(
  geography = "county",
  variables = "P1_001N",
  year = 2020,
  output = "wide"
) |>
  rename(county_ct = P1_001N)

combined_dataset_w_counts <- combined_dataset_no_geo |>
  mutate(GEOID = paste0(fips_state_code, fips_county_code)) |>
  left_join(county_cts, by = "GEOID") |>
  mutate(
    n_w_ever_smoker = floor(w_ever_smoker * county_ct),
    n_uw_ever_smoker = floor(uw_ever_smoker * county_ct),
    n_w_ever_asthma = floor(w_ever_asthma * county_ct),
    n_uw_ever_asthma = floor(uw_ever_asthma * county_ct)
  )

combined_dataset_w_counts |> saveRDS("data/final_data/combined_dataset_w_counts.rds")

4 Results

4.1 Analyses

combined_dataset_w_counts <- readRDS("data/final_data/combined_dataset_w_counts.rds")

# benefited from approach here https://rpubs.com/cqj_00/785193
combined_corrs <- combined_dataset_w_counts |>
  select(
    w_ever_asthma,
    w_ever_smoker,
    RPL_THEME1,
    RPL_THEME2,
    RPL_THEME3,
    RPL_THEME4,
    RPL_THEMES,
    county_adi_value,
    county_ct
  ) |>
  cor(use = "complete.obs", method = "spearman") |>
  as.matrix() %>%
  get_tri() %>%
  reshape2::melt(na.rm = TRUE) |> 
  rename(variable_1 = Var1,
         variable_2 = Var2)

combined_corrs |>
  ggplot(aes(x = variable_2, y = variable_1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(
    low = "blue",
    high = "red",
    mid = "grey100",
    # midpoint = 0.5,
    limit = c(-1, 0.999),
    space = "Lab",
    name = "Spearman\nCorrelation"
  ) +
  theme_minimal() + # minimal theme
  theme(
    axis.text.x = element_text(
      angle = 45,
      vjust = 1,
      size = 10,
      hjust = 1
    ),
    axis.text.y = element_text(size = 10)
  ) +
  coord_fixed() +
  coord_flip()

4.1.1 Correlations between SVI and ADI

svi_adi_model <-
  glm(county_adi_value ~ RPL_THEME1 + RPL_THEME2 + RPL_THEME3 + RPL_THEME4 + county_ct,
      family = gaussian(),
      data = combined_dataset_w_counts)


modelsummary(
  list(
    "svi_adi_model" = svi_adi_model
  ),
  coef_omit = "Intercept",
  statistic = c("p = {p.value}"),
  estimate = "{estimate} [{conf.low}, {conf.high}]"
)
svi_adi_model
RPL_THEME1 35.498 [30.280, 40.717]
p = <0.001
RPL_THEME2 25.911 [21.213, 30.609]
p = <0.001
RPL_THEME3 −40.294 [−44.688, −35.899]
p = <0.001
RPL_THEME4 4.416 [0.308, 8.525]
p = 0.035
county_ct 0.000 [0.000, 0.000]
p = <0.001
Num.Obs. 636
R2 0.606
AIC 4917.1
BIC 4948.3
Log.Lik. −2451.563
RMSE 11.42

4.1.2 Smoking

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_smoker)) +
  geom_density()  +
  theme_minimal()

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_smoker,
             y = county_adi_value)) +
  geom_point(alpha = 0.5) +
  theme_minimal()

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_smoker,
             y = RPL_THEME3)) +
  geom_point(alpha = 0.5) +
  theme_minimal()

# remove outliers
smoking_dataset_cleaned <- combined_dataset_w_counts |> 
  filter(w_ever_smoker >= 0.2)
# exploring modelling approaches..
# benefitted from https://rpubs.com/kaz_yos/poisson
smoking_quasibinom_adi_only <-
  glm(w_ever_smoker ~ county_adi_value,
      family = quasibinomial(),
      data = smoking_dataset_cleaned)

smoking_quasipoisson_adi_only <-
  glm(
    n_w_ever_smoker ~ county_adi_value,
    offset = log(county_ct),
    family = quasipoisson(link = "log"),
    data = smoking_dataset_cleaned
  )

smoking_quasibinom <-
  glm(w_ever_smoker ~ county_adi_value * RPL_THEME3,
      family = quasibinomial(),
      data = smoking_dataset_cleaned)

smoking_quasipoisson <-
  glm(
    n_w_ever_smoker ~ county_adi_value * RPL_THEME3,
    offset = log(county_ct),
    family = quasipoisson(link = "log"),
    data = smoking_dataset_cleaned
  )

modelsummary(
  list(
    "smoking_quasibinom" = smoking_quasibinom,
    "smoking_quasibinom_adi_only" = smoking_quasibinom_adi_only,
    "smoking_quasipoisson" = smoking_quasipoisson,
    "smoking_quasipoisson_adi_only" = smoking_quasipoisson_adi_only
  ),
  coef_omit = "Intercept",
  statistic = c("p = {p.value}"),
  estimate = "{estimate} [{conf.low}, {conf.high}]",
  exponentiate = TRUE
)
smoking_quasibinom smoking_quasibinom_adi_only smoking_quasipoisson smoking_quasipoisson_adi_only
county_adi_value 1.004 [1.002, 1.006] 1.006 [1.006, 1.007] 1.001 [1.000, 1.003] 1.005 [1.004, 1.005]
p = <0.001 p = <0.001 p = 0.096 p = <0.001
RPL_THEME3 0.645 [0.545, 0.764] 0.623 [0.556, 0.698]
p = <0.001 p = <0.001
county_adi_value × RPL_THEME3 1.003 [1.000, 1.006] 1.004 [1.001, 1.006]
p = 0.053 p = 0.001
Num.Obs. 634 634 634 634
RMSE 0.04 0.04 17935.23 22668.48

4.1.3 Asthma

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_asthma)) +
  geom_density()  +
  theme_minimal()

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_asthma,
             y = county_adi_value)) +
  geom_point(alpha = 0.5) +
  theme_minimal()

combined_dataset_w_counts |> 
  ggplot(aes(x = w_ever_asthma,
             y = RPL_THEME3)) +
  geom_point(alpha = 0.5) +
  theme_minimal()

# remove outliers
asthma_dataset_cleaned <- combined_dataset_w_counts
# exploring modelling approaches..
# benefitted from https://rpubs.com/kaz_yos/poisson
asthma_quasibinom <-
  glm(w_ever_asthma ~ county_adi_value * RPL_THEME3,
      family = quasibinomial(),
      data = asthma_dataset_cleaned)

asthma_quasipoisson <-
  glm(
    n_w_ever_asthma ~ county_adi_value * RPL_THEME3,
    offset = log(county_ct),
    family = quasipoisson(link = "log"),
    data = asthma_dataset_cleaned
  )

asthma_quasibinom_adi_only <-
  glm(w_ever_asthma ~ county_adi_value,
      family = quasibinomial(),
      data = asthma_dataset_cleaned)

asthma_quasipoisson_adi_only <-
  glm(
    n_w_ever_asthma ~ county_adi_value,
    offset = log(county_ct),
    family = quasipoisson(link = "log"),
    data = asthma_dataset_cleaned
  )

modelsummary(
  list(
    "asthma_quasibinom" = asthma_quasibinom,
    "asthma_quasibinom_adi_only" = asthma_quasibinom_adi_only,
    "asthma_quasipoisson" = asthma_quasipoisson,
    "asthma_quasipoisson_adi_only" = asthma_quasipoisson_adi_only
  ),
  coef_omit = "Intercept",
  statistic = c("p = {p.value}"),
  estimate = "{estimate} [{conf.low}, {conf.high}]",
  exponentiate = TRUE
)
asthma_quasibinom asthma_quasibinom_adi_only asthma_quasipoisson asthma_quasipoisson_adi_only
county_adi_value 1.000 [0.999, 1.002] 1.000 [1.000, 1.001] 0.999 [0.997, 1.001] 1.001 [1.000, 1.001]
p = 0.923 p = 0.254 p = 0.210 p = 0.028
RPL_THEME3 0.894 [0.775, 1.030] 0.736 [0.644, 0.842]
p = 0.120 p = <0.001
county_adi_value × RPL_THEME3 1.000 [0.998, 1.002] 1.002 [0.999, 1.005]
p = 0.917 p = 0.154
Num.Obs. 636 636 636 636
RMSE 0.02 0.02 9832.47 11178.78

5 Conclusion

6 References

Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2021.

Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2021.

Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index 2020 Database US. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html. Accessed on 2023/11/04.

Kind AJH, Buckingham W. Making Neighborhood Disadvantage Metrics Accessible: The Neighborhood Atlas. New England Journal of Medicine, 2018. 378: 2456-2458. DOI: 10.1056/NEJMp1802313. PMCID: PMC6051533.

Rollings KA, Noppert GA, Griggs JJ, Melendez RA, Clarke PJ. Comparison of two area-level socioeconomic deprivation indices: Implications for public health research, practice, and policy. PLoS One. 2023 Oct 5;18(10):e0292281. doi: 10.1371/journal.pone.0292281. PMID: 37797080; PMCID: PMC10553799.

Tipirneni R, Schmidt H, Lantz PM, Karmakar M. Associations of 4 Geographic Social Vulnerability Indices With US COVID-19 Incidence and Mortality. Am J Public Health. 2022 Nov;112(11):1584-1588. doi: 10.2105/AJPH.2022.307018. Epub 2022 Sep 15. PMID: 36108250; PMCID: PMC9558191.

University of Wisconsin School of Medicine and Public Health. 2021 Area Deprivation Index v4.0.1. Downloaded from https://www.neighborhoodatlas.medicine.wisc.edu/ 2023/11/04

Xie S, Himes BE. Approaches to Link Geospatially Varying Social, Economic, and Environmental Factors with Electronic Health Record Data to Better Understand Asthma Exacerbations. AMIA Annu Symp Proc. 2018 Dec 5;2018:1561-1570. PMID: 30815202; PMCID: PMC6371292.

Xie S, Hubbard RA, Himes BE. Neighborhood-level measures of socioeconomic status are more correlated with individual-level measures in urban areas compared with less urban areas. Ann Epidemiol. 2020 Mar;43:37-43.e4. doi: 10.1016/j.annepidem.2020.01.012. Epub 2020 Feb 11. PMID: 32151518; PMCID: PMC7160852.

7 Appendices

7.1 Session Info

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] srvyr_1.2.0         survey_4.2-1        survival_3.5-5     
 [4] Matrix_1.6-1.1      modelsummary_1.4.2  broom.mixed_0.2.9.4
 [7] kableExtra_1.3.4    RColorBrewer_1.1-3  gridExtra_2.3      
[10] leaflet_2.2.0       sf_1.0-14           tidycensus_1.5     
[13] tigris_2.0.4        readxl_1.4.3        lubridate_1.9.2    
[16] forcats_1.0.0       stringr_1.5.0       dplyr_1.1.2        
[19] purrr_1.0.2         readr_2.1.4         tidyr_1.3.0        
[22] tibble_3.2.1        ggplot2_3.4.3       tidyverse_2.0.0    
[25] haven_2.5.3        

loaded via a namespace (and not attached):
 [1] DBI_1.1.3               s2_1.1.4                rlang_1.1.1            
 [4] magrittr_2.0.3          furrr_0.3.1             e1071_1.7-13           
 [7] compiler_4.3.1          reshape2_1.4.4          systemfonts_1.0.4      
[10] vctrs_0.6.3             rvest_1.0.3             pkgconfig_2.0.3        
[13] wk_0.8.0                crayon_1.5.2            fastmap_1.1.1          
[16] backports_1.4.1         ellipsis_0.3.2          labeling_0.4.3         
[19] utf8_1.2.3              rmarkdown_2.24          tzdb_0.4.0             
[22] bit_4.0.5               xfun_0.40               jsonlite_1.8.7         
[25] uuid_1.1-1              broom_1.0.5             parallel_4.3.1         
[28] R6_2.5.1                tables_0.9.17           stringi_1.7.12         
[31] parallelly_1.36.0       estimability_1.4.1      jquerylib_0.1.4        
[34] cellranger_1.1.0        Rcpp_1.0.11             knitr_1.43             
[37] future.apply_1.11.0     parameters_0.21.2       leaflet.providers_2.0.0
[40] splines_4.3.1           timechange_0.2.0        tidyselect_1.2.0       
[43] rstudioapi_0.15.0       yaml_2.3.7              viridis_0.6.4          
[46] codetools_0.2-19        listenv_0.9.0           plyr_1.8.9             
[49] lattice_0.21-8          bayestestR_0.13.1       withr_2.5.1            
[52] coda_0.19-4             evaluate_0.21           future_1.33.0          
[55] units_0.8-4             proxy_0.4-27            xml2_1.3.5             
[58] pillar_1.9.0            KernSmooth_2.23-21      checkmate_2.2.0        
[61] DT_0.30                 insight_0.19.5          generics_0.1.3         
[64] vroom_1.6.3             hms_1.1.3               munsell_0.5.0          
[67] scales_1.2.1            xtable_1.8-4            globals_0.16.2         
[70] class_7.3-22            glue_1.6.2              emmeans_1.8.8          
[73] tools_4.3.1             webshot_0.5.5           mvtnorm_1.2-3          
[76] mitools_2.4             crosstalk_1.2.0         datawizard_0.9.0       
[79] colorspace_2.1-0        nlme_3.1-162            performance_0.10.5     
[82] cli_3.6.1               rappdirs_0.3.3          fansi_1.0.4            
[85] viridisLite_0.4.2       svglite_2.1.1           rematch_1.0.1          
[88] gt_0.9.0                gtable_0.3.4            digest_0.6.33          
[91] classInt_0.4-10         htmlwidgets_1.6.2       farver_2.1.1           
[94] htmltools_0.5.6         lifecycle_1.0.3         httr_1.4.7             
[97] MASS_7.3-60             bit64_4.0.5            

7.2 SMART BRFSS CDC dataset

mmsa_colnames_all |> 
  kable()
colname label
DISPCODE FINAL DISPOSITION
STATERE1 RESIDENT OF STATE
CELPHON1 CELLULAR TELEPHONE
LADULT1 ARE YOU 18 YEARS OF AGE OR OLDER?
COLGSEX ARE YOU MALE OR FEMALE?
LANDSEX ARE YOU MALE OR FEMALE?
RESPSLCT RESPONDENT SELECTION
SAFETIME SAFE TIME TO TALK?
CADULT1 ARE YOU 18 YEARS OF AGE OR OLDER?
CELLSEX ARE YOU MALE OR FEMALE?
HHADULT NUMBER OF ADULTS IN HOUSEHOLD
SEXVAR SEX OF RESPONDENT
GENHLTH GENERAL HEALTH
PHYSHLTH NUMBER OF DAYS PHYSICAL HEALTH NOT GOOD
MENTHLTH NUMBER OF DAYS MENTAL HEALTH NOT GOOD
POORHLTH POOR PHYSICAL OR MENTAL HEALTH
PRIMINSR WHAT IS PRIMARY SOURCE OF HEALTH INSURAN
PERSDOC3 HAVE PERSONAL HEALTH CARE PROVIDER?
MEDCOST1 COULD NOT AFFORD TO SEE DOCTOR
CHECKUP1 LENGTH OF TIME SINCE LAST ROUTINE CHECKU
EXERANY2 EXERCISE IN PAST 30 DAYS
BPHIGH6 EVER TOLD BLOOD PRESSURE HIGH
BPMEDS CURRENTLY TAKING BLOOD PRESSURE MEDICATI
CHOLCHK3 HOW LONG SINCE CHOLESTEROL CHECKED
TOLDHI3 EVER TOLD CHOLESTEROL IS HIGH
CHOLMED3 CURRENTLY TAKING MEDICINE FOR HIGH CHOLE
CVDINFR4 EVER DIAGNOSED WITH HEART ATTACK
CVDCRHD4 EVER DIAGNOSED WITH ANGINA OR CORONARY H
CVDSTRK3 EVER DIAGNOSED WITH A STROKE
ASTHMA3 EVER TOLD HAD ASTHMA
ASTHNOW STILL HAVE ASTHMA
CHCSCNCR (EVER TOLD) YOU HAD SKIN CANCER?
CHCOCNCR (EVER TOLD) YOU HAD ANY OTHER TYPES OF C
CHCCOPD3 EVER TOLD YOU HAD C.O.P.D. EMPHYSEMA OR
ADDEPEV3 (EVER TOLD) YOU HAD A DEPRESSIVE DISORDE
CHCKDNY2 EVER TOLD YOU HAVE KIDNEY DISEASE?
DIABETE4 (EVER TOLD) YOU HAD DIABETES
DIABAGE3 AGE WHEN TOLD DIABETES
HAVARTH5 TOLD HAVE ARTHRITIS
ARTHEXER DR. SUGGEST USE OF PHYSICAL ACTIVITY OR
ARTHEDU EVER TAKEN CLASS IN MANAGING ARTHRITIS O
LMTJOIN3 LIMITED BECAUSE OF JOINT SYMPTOMS
ARTHDIS2 DOES ARTHRITIS AFFECT WHETHER YOU WORK
JOINPAI2 HOW BAD WAS JOINT PAIN
MARITAL MARITAL STATUS
EDUCA EDUCATION LEVEL
RENTHOM1 OWN OR RENT HOME
NUMHHOL3 HOUSEHOLD TELEPHONES
NUMPHON3 RESIDENTIAL PHONES
CPDEMO1B DO YOU HAVE A CELL PHONE FOR PERSONAL US
VETERAN3 ARE YOU A VETERAN
EMPLOY1 EMPLOYMENT STATUS
CHILDREN NUMBER OF CHILDREN IN HOUSEHOLD
INCOME3 INCOME LEVEL
PREGNANT PREGNANCY STATUS
WEIGHT2 REPORTED WEIGHT IN POUNDS
HEIGHT3 REPORTED HEIGHT IN FEET AND INCHES
DEAF ARE YOU DEAF OR DO YOU HAVE SERIOUS DIFF
BLIND BLIND OR DIFFICULTY SEEING
DECIDE DIFFICULTY CONCENTRATING OR REMEMBERING
DIFFWALK DIFFICULTY WALKING OR CLIMBING STAIRS
DIFFDRES DIFFICULTY DRESSING OR BATHING
DIFFALON DIFFICULTY DOING ERRANDS ALONE
SMOKE100 SMOKED AT LEAST 100 CIGARETTES
SMOKDAY2 FREQUENCY OF DAYS NOW SMOKING
USENOW3 USE OF SMOKELESS TOBACCO PRODUCTS
ECIGNOW1 DO YOU NOW USE E-CIGARETTES, EVERY DAY,
ALCDAY5 DAYS IN PAST 30 HAD ALCOHOLIC BEVERAGE
AVEDRNK3 AVG ALCOHOLIC DRINKS PER DAY IN PAST 30
DRNK3GE5 BINGE DRINKING
MAXDRNKS MOST DRINKS ON SINGLE OCCASION PAST 30 D
FLUSHOT7 ADULT FLU SHOT/SPRAY PAST 12 MOS
FLSHTMY3 WHEN RECEIVED MOST RECENT SEASONAL FLU S
IMFVPLA2 WHERE DID YOU GET YOUR LAST FLU SHOT/VAC
PNEUVAC4 PNEUMONIA SHOT EVER
HIVTST7 EVER TESTED H.I.V.
HIVTSTD3 MONTH AND YEAR OF LAST HIV TEST
FRUIT2 HOW MANY TIMES DID YOU EAT FRUIT?
FRUITJU2 HOW MANY TIMES DID YOU DRINK 100 PERCENT
FVGREEN1 HOW MANY TIMES DID YOU EAT DARK GREEN VE
FRENCHF1 HOW OFTEN DO YOU EAT FRENCH FRIES OR FRI
POTATOE1 HOW OFTEN DO YOU EAT FRENCH FRIES OR FRI
VEGETAB2 HOW OFTEN DO YOU EAT FRENCH FRIES OR FRI
_STSTR SAMPLE DESIGN STRATIFICATION VARIABLE
_IMPSEX IMPUTED GENDER
CAGEG FOUR LEVEL CHILD AGE
_RFHLTH ADULTS WITH GOOD OR BETTER HEALTH
_PHYS14D COMPUTED PHYSICAL HEALTH STATUS
_MENT14D COMPUTED MENTAL HEALTH STATUS
_HLTHPLN HAVE ANY HEALTH INSURANCE
_HCVU652 RESPONDENTS AGED 18-64 WITH HEALTH INSUR
_TOTINDA LEISURE TIME PHYSICAL ACTIVITY CALCULATE
_RFHYPE6 HIGH BLOOD PRESSURE CALCULATED VARIABLE
_CHOLCH3 CHOLESTEROL CHECKED CALCULATED VARIABLE
_RFCHOL3 HIGH CHOLESTEROL CALCULATED VARIABLE
_MICHD RESPONDENTS THAT HAVE EVER REPORTED HAVI
_LTASTH1 LIFETIME ASTHMA CALCULATED VARIABLE
_CASTHM1 CURRENT ASTHMA CALCULATED VARIABLE
_ASTHMS1 COMPUTED ASTHMA STATUS
_DRDXAR3 RESPONDENTS DIAGNOSED WITH ARTHRITIS
_LMTACT3 LIMITED USUAL ACTIVITIES
_LMTWRK3 LIMITED WORK ACTIVITIES
_PRACE1 COMPUTED PREFERRED RACE
_MRACE1 CALCULATED NON-HISPANIC RACE INCLUDING M
_HISPANC HISPANIC, LATINO/A, OR SPANISH ORIGIN CA
_RACE COMPUTED RACE-ETHNICITY GROUPING
_RACEG21 COMPUTED NON-HISPANIC WHITES/ALL OTHERS
_RACEGR3 COMPUTED FIVE LEVEL RACE/ETHNICITY CATEG
_RACEPRV COMPUTED RACE GROUPS USED FOR INTERNET P
_SEX CALCULATED SEX VARIABLE
_AGEG5YR REPORTED AGE IN FIVE-YEAR AGE CATEGORIES
_AGE65YR REPORTED AGE IN TWO AGE GROUPS CALCULATE
_AGE80 IMPUTED AGE VALUE COLLAPSED ABOVE 80
_AGE_G IMPUTED AGE IN SIX GROUPS
WTKG3 COMPUTED WEIGHT IN KILOGRAMS
_BMI5 COMPUTED BODY MASS INDEX
_BMI5CAT COMPUTED BODY MASS INDEX CATEGORIES
_RFBMI5 OVERWEIGHT OR OBESE CALCULATED VARIABLE
_EDUCAG COMPUTED LEVEL OF EDUCATION COMPLETED CA
_INCOMG1 COMPUTED INCOME CATEGORIES
_SMOKER3 COMPUTED SMOKING STATUS
_RFSMOK3 CURRENT SMOKING CALCULATED VARIABLE
_CURECI1 CURRENT E-CIGARETTE USER CALCULATED VARI
DRNKANY5 DRINK ANY ALCOHOLIC BEVERAGES IN PAST 30
_RFBING5 BINGE DRINKING CALCULATED VARIABLE
_DRNKWK1 COMPUTED NUMBER OF DRINKS OF ALCOHOL BEV
_RFDRHV7 HEAVY ALCOHOL CONSUMPTION CALCULATED VA
_FLSHOT7 FLU SHOT CALCULATED VARIABLE
_PNEUMO3 PNEUMONIA VACCINATION CALCULATED VARIABL
_AIDTST4 EVER BEEN TESTED FOR HIV CALCULATED VARI
FTJUDA2_ COMPUTED FRUIT JUICE INTAKE IN TIMES PER
FRUTDA2_ COMPUTED FRUIT INTAKE IN TIMES PER DAY
GRENDA1_ COMPUTED DARK GREEN VEGETABLE INTAKE IN
FRNCHDA_ FRENCH FRY INTAKE IN TIMES PER DAY
POTADA1_ COMPUTED POTATO SERVINGS PER DAY
VEGEDA2_ COMPUTED OTHER VEGETABLE INTAKE IN TIMES
_MISFRT1 THE NUMBER OF MISSING FRUIT RESPONSES
_MISVEG1 THE NUMBER OF MISSING VEGETABLE RESPONSE
_FRTRES1 MISSING ANY FRUIT RESPONSES
_VEGRES1 MISSING ANY VEGETABLE RESPONSES
_FRUTSU1 TOTAL FRUITS CONSUMED PER DAY
_VEGESU1 TOTAL VEGETABLES CONSUMED PER DAY
_FRTLT1A CONSUME FRUIT 1 OR MORE TIMES PER DAY
_VEGLT1A CONSUME VEGETABLES 1 OR MORE TIMES PER D
_FRT16A REPORTED CONSUMING FRUIT >16/DAY
_VEG23A REPORTED CONSUMING VEGETABLES >23/DAY
_FRUITE1 FRUIT EXCLUSION FROM ANALYSES
_VEGETE1 VEGETABLE EXCLUSION FROM ANALYSES
_MMSA THE CODE OF METROPOLITAN OR MICROPOLITAN STATISTICAL AREA WHERE THE RESPONDENT LIVES
_MMSAWT THE MMSA-LEVEL WEIGHT THAT IS USED WHEN GENERATING MMSA-LEVEL ESTIMATES FOR VARIABLES IN THE DATA SET
SEQNO SEQUENCE NUMBER
MMSANAME THE MMSA NAME
_resp_id RESPONDENT ID

7.3 Asthma and smoking BRFSS variables

The table below summarizes the variables of interest, combining information from https://www.cdc.gov/brfss/annual_data/2021/summary_matrix_21.html and https://www.cdc.gov/brfss/annual_data/2021/pdf/2021-calculated-variables-version4-508.pdf

Output Variables
(In Final Data Set)
Description or Result
of Calculation
Values
_LTASTH1

CV for lifetime asthma prevalence / Calculated variable for adults who have ever been told they have asthma.

_LTASTH1 is derived from ASTHMA3.

1 No Respondents who have not been told by a doctor, nurse, or health professional that they had asthma.
2 Yes Respondents who have been told by a doctor, nurse, or health professional that they had asthma.
9 Don’t know/Not Sure or Refused/Missing Respondents who reported they did not know if they had been told by a doctor, nurse, or health professional that they had asthma, those who refused to answer if they had been told by a doctor, nurse. or health professional that they had asthma, or those with missing responses.
_CASTHM1

CV for current asthma prevalence / Calculated variable for adults who have been told they currently have asthma.

_CASTHM1 is derived from ASTHMA3 and ASTHNOW.

1 No Respondents who have not been told by a doctor, nurse, or health professional that they had asthma or do not still have asthma.
2 Yes Respondents who have been told by a doctor, nurse, or health professional that they had asthma and that they still have asthma.
9 Don’t know/Not Sure or Refused/Missing Respondents who reported they did not know if they had been told by a doctor, nurse, or health professional that they had asthma, those who refused to answer if they had been told by a doctor, nurse, or health professional that they had asthma, those who did not know if they still had asthma, those who refused to answer if they still had asthma, or those with missing responses.
_ASTHMS1

Computed asthma status / Calculated variable for computed asthma status.

_ASTHMS1 is derived from ASTHMA3 and ASTHNOW

1 Current Respondents who have been told by a doctor, nurse, or health professional that they had asthma and that they still have asthma.
2 Former Respondents who have been told by a doctor, nurse, or health professional that they had asthma but do not still have asthma.
3 Never Respondents who have not been told by a doctor, nurse, or health professional that they had asthma.
9 Don’t know/Not Sure or Refused/Missing Respondents who reported they didn’t know if they had been told by a doctor, nurse, or health professional that they had asthma, those who refused to answer if they had been told by a doctor, nurse, or health professional that they had asthma, those who didn’t know if they still had asthma, those that refused to answer if they still had asthma, or those with missing responses.
_SMOKER3

Smoking Status: Everyday smoker, someday smoker, former smoker and non smoker / Calculated variable for four-level smoker status: everyday smoker, someday smoker, former smoker,

non-smoker. _SMOKER3 is derived from SMOKE100 and SMOKDAY2.

1 Current smoker–now smokes every day Respondents who reported having smoked at least 100 cigarettes in their lifetime and now smoke every day.
2 Current smoker–now smokes some days Respondents who reported having smoked at least 100 cigarettes in their lifetime and now smoke some days.
3 Former smoker Respondents who reported having smoked at least 100 cigarettes in their lifetime and currently do not smoke.
4 Never smoked Respondents who reported they had not smoked at least 100 cigarettes in their lifetime.
9 Don’t know/Refused/Missing Respondents who reported they didn’t know if they had smoked 100 cigarettes in their lifetime, those who refused to answer if they had smoked 100 cigarettes in their lifetime, those who didn’t know if they now smoked every day, some days or not at all, those who refused to answer if they now smoked every day, some days or not at all, or those with missing responses.
_RFSMOK3 CV for current smoking status / _RFSMOK3 Calculated variable for adults who are current smokers. _RFSMOK3 is derived from _SMOKER3.
1 No Respondents who reported they had not smoked at least 100 cigarettes in their lifetime, those who reported having smoked 100 cigarettes in their lifetime but do not currently smoke.
2 Yes Respondents who reported having smoked at least 100 cigarettes in their lifetime and currently smoke.
9 Don’t know/Refused/Missing Respondents who reported they did not know if they had smoked 100 cigarettes in their lifetime, those who refused to answer if they had smoked 100 cigarettes in their lifetime, those who didn’t know if they now smoked every day, some days or not at all, those who refused to answer if they now smoked every day, some days or not at all, or those with missing responses.
_CURECI1

CV for current e-cigarette smoker status / Calculated variable for adults who are current e-cigarette users.

_CURECI1 is derived from ECIGNOW1.

1 Not currently using E-cigarettes Respondents who reported they had not used E-cigarettes in their lifetime, those who reported having used E-cigarettes in their lifetime but do not currently use E-cigarettes.
2 Current E-cigarette user Respondents who reported having used E-cigarettes in their lifetime and currently use E-cigarettes.
9 Don’t know/Refused/Missing Respondents who reported they did not know if they had used E-cigarettes in their lifetime, those who refused to answer if they had used E-cigarettes in their lifetime, those who didn’t know if they now used E-cigarettes every day, some days or not at all, those who refused to answer if they now used E-cigarettes every day, some days or not at all, or those with missing responses.